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NUMERICAL COMPUTATION OF SENSITIVITIES AND THE ADJOINT APPROACH 


ROBERT MICHAEL LEWIS * 


Abstract. We discuss the numerical computation of sensitivities via the adjoint approach in optimization 
problems governed by differential equations. We focus on the adjoint problem in its weak form. We show how 
one can avoid some of the problems with the adjoint approach, such as deriving suitable boundary conditions 
for the adjoint equation. We discuss the convergence of numerical approximations of the costate computed 
via the weak form of the adjoint problem and show the significance for the discrete adjoint problem. 

Key words. Adjoint approach, costate, sensitivities. 

Subject classification. Applied and Numerical Mathematics 

1. Introduction. In this paper we discuss the numerical computation of sensitivities via the adjoint 
approach in optimization problems governed by differential equations. We focus on the weak form of the 
adjoint problem. The weak form of the adjoint problem allows one to finesse the issue of identifying the 
adjoint problem as a conventional boundary- value, initial- value, or initial- boundary- value problem. We also 
discuss how one can use the weak form of the adjoint problem to compute numerical approximations of the 
costate and relate this approach to the discrete adjoint problem. 

The context for this discussion is the nonlinear programming problem 


(i.i) 


minimize F(a) = /(a, u(a)) 
subject to C(a) — c(a,u(a)) > 0, 


where u(a ) is the solution of some set of differential equations, 


(1.2) S(a, u(a)) = 0. 

We will refer to a as the design variable and u as the state variable. For simplicity, we restrict our attention 
to the question of computing derivatives associated with the objective F(a) and ignore the constraints C(a). 

The adjoint approach [3, 11, 14] allows one to compute the derivative F'(a ) of F(a) with respect to a in 
a very efficient manner. The primary cost in computing F'(a) via the adjoint approach is the calculation of 
an intermediate quantity A, called the costate or adjoint state, as the solution of the adjoint problem, which 
is a linear problem associated with the governing equation (1.2). 

However, the adjoint approach is not without attendant difficulties. These difficulties are particularly 
associated with the calculation of the intermediate quantity A. One problem that arises is that of determining 
the appropriate adjoint problem. It is not always the case that one can easily identify the adjoint problem as 
some manner of conventional initial- value, boundary- value, or initial-boundary- value problem. Objectives in 
optimization problems for which one cannot identify the appropriate adjoint problem in a straightforward 
way are sometimes called “inadmissible.” 

*ICASE, Mail Stop 403, NASA Langley Research Center, Hampton, Virginia 23681-0001, buckaroo® lease. edu. This research 
was supported by the National Aeronautics and Space Administration under NASA Contract No. NAS1-19480 while the author 
was in residence at the Institute for Computer Applications in Science and Engineering (ICASE), NASA Langley Research 
Center, Hampton, VA 23681-0001. 
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The problem of inadmissibility is reported, for instance, in [2, 6], where the authors conclude that only 
certain objectives arc admissible in aerodynamic optimization problems. The difficulty with the “inadmis- 
sible” cost functionals encountered by these authors stems from the absence of suitable boundary terms to 
cancel terms that appear due to the choice of cost functional. Another question associated with the adjoint 
approach is that of the convergence of numerical approximations of the costate A. Finally, from the point 
of view of numerical optimization, there is the paramount question of the approximation of the derivative 
F'(a) and the convergence of such approximations under refinement of the discretization of (1.1)- (1.2). 

We discuss how studying the adjoint problem in its weak form gives us one way to address these issues. 
The weak form of the adjoint problem is based on the identity of the costate A as a linear functional. 

The weak form of the adjoint problem always exists, which allows us to resolve the problem of “inad- 
missibility.” Furthermore, we can solve the weak form of the adjoint problem numerically and can derive 
convergence estimates for A that depend on the convergence estimates for the solution of the forward problem 
(1.2). While the approximations of A may converge in a very weak sense, the ensuing convergence estimates 
for A nevertheless enable us to derive useful convergence estimates for approximations of the derivative F f (a). 
Wc also relate the numerical approximation of the weak form of the adjoint problem to the adjoint of the 
discretized problem. 

2. Issues in the numerical calculation of sensitivities. We begin with a discussion of some of 
the issues in the numerical calculation of sensitivities for optimization problems governed by differential 
equations. 

2.1. Consistency of sensitivity calculations. We assume that the discretization of u (and possibly 
a) is parameterized by h, where the discretization is refined as h — + 0. There are two senses of consistency 
for numerical approximations of sensitivities for the problem (1.1)- (1.2). 

Consistency with the infinite- dimensional problem . As the discretization is refined, the approximation 
A* 1 of the costate and the approximation F f h {a) of the derivative should converge in a suitable sense to the 
correct quantities. One would also hope for estimates of the rate of convergence. 

Consistency with the finite- dimensional computational problem . At any level of discretization ft, the ap- 
proximation of the derivative F' (a) should correspond to the derivative of the finite- dimensional optimization 
problem obtained by discretizing (1.1)— (1.2) at level h. 

The necessity of the first sense of consistency is clear; if the numerical scheme is to make sense then it 
should converge under refinement of the discretization. The necessity of the second sense of consistency is 
subject to debate. One could, for instance, derive the adjoint problem and discretize it in a manner indepen- 
dent of the discretization of (1.1) (1.2). Asymptotically the sensitivities derived by these two approaches will 
be consistent (provided both are consistent with the infinite- dimensional problem), which might lead one to 
conclude that consistency of derivatives with the discretized optimization problem is unnecessary. However, 
inconsistencies at the discrete level may cause an optimization algorithm applied to the discretized version 
of the optimization problem to terminate prematurely. 

This is not so much an issue in the case of unconstrained optimization. For instance, consider the situa- 
tion in Fig. 2.1, in which a (E IR 2 is the current value of the design variables, and — g(a) is an approximation 
to the negative gradient — VF(o) at a. In the case of unconstrained minimization, — g(a ) is well within the 
cone of descent directions for F and is a reasonable approximation of — VF(a). The analysis in [5] of the 
use of inexact gradient information in trust region methods for unconstrained minimization indicates that 
optimization algorithms for unconstrained minimization are surprisingly insensitive to such errors. 
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-VF(a) 


Fig. 2.1. Acceptability of errors in the derivative in the unconstrained case . 

However, the situation changes if we add even the simplest of constraints. Consider the previous situation 
with the addition of two bound constraints that are binding at a, as depicted in Fig. 2.2. In this picture 

i 


Nn (a) -g(a) 



Fig. 2.2. Unacceptability of errors in the derivative in the constrained case. 

the feasible region is denoted by ft. Now the approximation —g (a) lies in the normal cone of ft at a, Nn(a)- 
This would lead us to the erroneous conclusion that a is a Karush- Kuhn- Tucker point (i.e., a constrained 
stationary point), since it appears that at a we cannot improve F locally without violating the constraints. 
In the constrained problem, the cone of feasible descent directions is not determined just by — VJF(a) but 
also by the constraint geometry, and in general this cone is much smaller than in the case of unconstrained 
minimization. As a consequence, in constrained optimization even small errors in the derivative can lead to 
problems. 

Thus small errors in the gradient can lead to serious problems in the application of optimization al- 
gorithms. Consistency of the numerical approximations of sensitivities with the discretized optimization 
problem is one step to reducing the presence of such errors. From a practical point of view, moreover, 
consistency of derivatives with the discretized problem makes it easier to check the correctness of their 
implementation. 

In the case the finite- dimensional computational problem, the desired sense of consistency is clear: 
sensitivities should be numerically consistent (i.e., consistent to the extent allowed by the limitations of 
machine precision). Consistency with the infinite-dimensional problem, that is, the sense in which X h and 
F f h {a) should converge to A and F'(a), is more subtle. 

Both A and F r (a ) are linear functionals; let (A, u) denote the value obtained by applying the linear 
functional A to the vector v. One typically posits that the action of A can be expressed as 

( 2 . 1 ) (A, v) = j A(x)i;(a;) 

where A (a:) is a function sufficiently smooth to permit integration by parts against the linearization of the 
state equation (1.2). One then attempts to to identify A as the solution of an adjoint differential equation. 
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If this can be done, one can then apply the convergence analysis of the schemes for the solution of this 
differential equation to obtain estimates of the convergence of an approximation \ h to A. However, there is 
a problem with this approach. An arbitrary numerical scheme for discretizing the adjoint equation will not 
necessarily be numerically consistent with the discretized problem. We will return to this point in §4.1. 

2.2. Deriving the adjoint problem. Moreover, it is not always the case that we can write A in the 
form (2.1). A related problem is that of “inadmissible” objectives, mentioned in the introduction, where one 
cannot identify A as the solution of a differential equation. The following example illustrates both points. 
For a G IR, let 


u'(x) = a x G (0, 1) 

u{ 0 ) = 0 

and define F(a) = u'( 1); note that F(a ) = a. If we try to apply the adjoint method in the usual way to 
compute F f (a ), beginning with (2.1) and integrating by parts, we obtain 

f A (x)u(x) ~ A(l)u(l) — f \'(x)u(x) = u'( 1). 

Jo Jo 

for all u G C^O, 1], say. Now we are stuck; if we require A to satisfy the adjoint ODE — A'(x) = 0 for 
x G (0, 1) we obtain the requirement 


A(l)ti(l) = u'(l), 

for all u G C^ 1 [0, 1] and no choice for the terminal condition A(l) will insure this. The problem is that 
A = 5'(1) and this is supported at x = 1, where we also need to impose a terminal condition for A, which 
makes A over-determined at x = 1. Also note that A is not a function, so the assumption (2.1) is not, strictly 
speaking, justified. On the other hand, the objective F(a) — a is unequivocally a differentiable function of 
a single real variable, so any obstruction must be illusory. 

2.3. Representation of linear functionals and directions of steepest descent. Related to the 
question of convergence of numerical approximations to A and F'(a) is the question of just what such 
approximations are converging to. In a very real sense, linear functionals such as F f (a) and A have no 
intrinsic identity other than their values when applied to their arguments. 

Consider, for instance, the familiar 5-functional defined on the Sobolev space H l [— 1 , 1] by (6, u) = u(0), 
This functional is bounded on H l [— 1,1], We can represent the action of 5 in many different ways. For 
instance, we can write 5 as the limit of the sequence 

(2.2) S k (x ) = fcxfe(x), 

where Xk is the characteristic function of the interval [-1/2A;, l/2Ar]: 

(5, u) = n(0) = lim / S k (x)u(x)dx. 

We can also represent 6 as integration against a measure // defined by 

fO\ / 0 x< t Q 
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Then 


{£, u) = j u(x)dfi(x). 


Furthermore, since 8 is a bounded linear functional on if 1 [—1,1], the Riesz representation theorem [9] 
guarantees the existence of w G H l [— 1, 1] such that in the H l inner product (• , *) H i, 


(' s > u) = {w > u) H i[_i i] 



u>(x)u(a;) + w f (x)u f (x). 


In this case one can check that 


(2*3) 


j{x) = l 


■ e 2 e x 


2(1 + e 2 ) 
e x - e 2 e“ x 

2(1 + e 2 ) 


-1 < x < 0 

0 < x < 1 


Thus we see that 8 has no intrinsic identity save (J, it) = it(0). While 8 can be represented by the 
reasonable function w(x ) in (2.3), 8 can also be viewed as the limit of the sequence in (2.2), which is not 
convergent in L 2 and pointwise is converging to zero almost everywhere. 

Given that it is not clear what it means to “look at” A and F'(a) as linear functionals, how then arc 
these quantities to be interpreted? The notion of a direction of steepest provides one answer to this question. 

The direction of steepest descent for F with respect to the norm || ■ \\ x is a solution of 


^ minimize ( F’(a ), p } 

subject to || p \\ x < 1, 

provided a solution exists. Note that this direction depends on the choice of norm; the direction of steepest 
descent tells us the direction that infinitesimally yields the greatest decrease in F per unit distance , and 
distance depends on the choice of norm. Since the direction of steepest descent is a direction in X, it gives 
us something meaningful to examine (e.g., actually plot). We would replace F'(a) by A in (2.4) if A were the 
functional of interest. 

Since directions of steepest descent are directions in the design space, they can be used in optimization 
algorithms. For instance, in the usual adjoint approach, one hopes to obtain a representation of the form 

(F'(o), p) = J gp 

for g e L 2 . The Cauchy- Schwarz inequality then says that g defines the direction of steepest descent with 
respect to the L 2 norm. 

On the other hand, there is no L 2 direction of steepest descent for the ^-functional because 5 is not 
bounded on L 2 . At the same time, the representer w of 5 in the H 1 inner product given in (2.3) defines 
(again by the Cauchy- Schwarz inequality) the direction of steepest descent for 6 in the H 1 norm. The 
computation of directions of steepest descent in the H k norm is discussed in [10]. 

2.4. Further remarks on the representation of the costate. Actually, the assumption that A can 
be written in the form (2.1) is, in general, almost true in a sense we can make precise. This observation 
depends on the density of functionals of the form (2.1) in the negative norm Sobolev spaces [1, 13] and other 
spaces. We will use this observation in §4.1 in connection with numerical schemes for computing A. 

The Riesz representation theorem allows us to identify H k with its dual (i.e., the space of bounded linear 
functionals on H k ): given any linear functional t on H k we can find w € H k for which (£, u) = (w , u) Rk for 
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all u e H k . The negative norm space H~ k is an alternative representation of the dual of H k . For w e L 2 , 
the negative Sobolev norm is defined to be 



The negative norm Sobolev space H k is defined to be the completion of Z> 2 with respect to this norm. 

This means that functionals of the form (2.1) are dense in H~ so if A £ H~ k then A can be approximated 
arbitrarily well (in H k norm) by functionals of the form (2.1). Functionals of the form (2.1) are dense in 
many other dual spaces, as well, such as ( C k )'. Thus we should not be surprised that the assumption (2.1) 
turns out to be correct as often as it does. On the other hand, the examples of the previous sections show 
that A need not have such a representation. 

3. The weak form of the adjoint problem. We now turn from these general questions concerning 
the calculation of sensitivities to the adjoint problem in its weak form. We give two derivations of the weak 
form of the adjoint problem. The first is based on implicit differentiation, while the second is based on the 
variation of the Lagrangian. 

W/e assume that a £ X , u £ {/, and S : (a, u) £ AT x U •— > S(a, u) £ V, where AC, U ) V are Banach spaces. 
We also assume that / is differentiable on X x U, and that given (a, u ) for which S(a, u) = 0 (i.e., u solves 
(1.2)), the partial Frechet derivative of S with respect to u, S u (a, u), is boundedly invertible as a map from 
U to V. 

We denote by AC', U\ and V' the dual spaces of AT, U, V (i.e., the spaces of bounded linear functionals on 
X,U,V). Wc denote the adjoint [9] of a bounded linear operator A by A * ; if A : Y — > Z, then A x : Z f -+Y* 
is defined by (A x z r , y) Y = (z\ Ay) z for all v' £ V\ 

3.1. Derivation via implicit differentiation. Under the preceding hypotheses, the classical implicit 
function theorem [8] assures us that 5(a,w(a)) = 0 defines u(a) as a smooth function of a and that the 
derivative of u(a) with respect to a is 


(3.1) 


du 

da 


(a) = -5~ 1 5' a 


(a,ti(a)) 


This formula for the Jacobian of u with respect to a is formally just the result of applying implicit differen- 
tiation to S(a, u(a)) = 0. The derivative of F is then 


(3.2) 


F’(a) =/a + /«T 


— fa fuS u 1 S a 


da l(a,ti(a)) 

At this point the costate A £ V* appears. Define 


! (a,u(o)) 


(3.3) X = —fuS" 1 ] 

then F r (a) = f a 4* A5 a , where the right-hand side is evaluated at (a, it (a)). 

Next we derive the weak form of the adjoint problem and make clear the role adjointness plays. From 

(3.3) , A £ V f satisfies 

( 3 - 4 ) <AS U , y) v = - (/«, v) u% 

for all v £ U. However, by definition 


(3.5) 


(A S U} v) v — (A, S u is) v — (S x A, v) u , 
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so 


(3.6) \ = — (S u )~ x f u . 

Note that S~ x : f u E U' — (5 U )“ X / u € V 9 . 

From (3.4) and (3.5) we obtain 

(3.7) (A, S u v) v = — (/ u , v)u 

for all v E U . This relation we call the weak form of the adjoint problem since it defines A only through the 
action of A as a linear functional on vectors in its domain. 

The weak form of the adjoint problem always exists and is a distributional differential equation. Any 
boundary conditions are implicit. Such equations are discussed in [12], for instance. There remains the 
question of relating the weak form of the adjoint problem to a conventional differential equation. As we 
have noted before, usually one can identify an adjoint differential equation (and boundary conditions) that 
is equivalent to (3.7); however, we have also seen that this not always the case. The interpretation of 
distributional differential equations in terms of conventional differential equations arises more generally 
in the solution of differential equations with distributional data and is not always possible [12]. On the 
other hand, the weak form of the adjoint problem always exists, avoiding the problem of “inadmissibility” 
mentioned previously. 

We can relate these formulae to F'(a), beginning with (3.2). Given 7} € X, 

<F'(a),7 l)x = (fa + f^,v) x 

= (1. faV)nt + (/«» = + (da /“> ^) x • 

or just 

(3.8) F'(a) = / x l + ^*/ u . 

From (3.1) we obtain 

(3.9) ^*/u = -S x (S u r x / u . 

Then (3.6), (3.8), and (3.9) yield 

(3-10) F'(a) = f* 1 - <5 x (S u )~ x f u = / X 1 + S X A. 

In the finite- dimensional case the first identity in (3.10) corresponds to transposing (3.2) to obtain VF(a) = 
V a f-SZS; T V u f. 

3.2. Derivation via variation of the Lagrangian. We can also derive the weak form of the adjoint 
problem by examining the variation of the Lagrangian due to variations in a. This approach, another form 
of the calculations of the preceding section, begins with the Lagrangian L(a, u; A) = /(a, u) + (A, £(a, u )} . 
One typically assumes that the action of the linear functional A can be represented as an integral of the 
form (2.1), but as discussed in §2 this may not be justified. For that reason we adhere to the abstract 
representation (A, S(a, u)) for the action of A on the equation residual S(a, u). 
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The effect SL of a variation 5a on the Lagrangian L is given by 

5L — Sf -f- (A, SS) = f a Sa + fudu -r (A, S a 5a -f- S u 5u ) , 

where to first-order the perturbation 5u in u due to the perturbation 5a satisfies S a 5a+S u 5u = 0. Rearranging 
terms we have 


5L = / a <Ja + (A, S a 5a) + + (A, S u 5u ) . 


If we require A to satisfy 

(3.11) f u 5u + (A, S u 5u ) = 0 

for all perturbations 6u , then F'(a)<5<2 = / a <5a -f (A, 5 a Ja) for all Ja, which determines F'(a). 

Typically one insures that (3.11) holds for all 5u by starting with the representation (2.1) for the action 
of A and applying integration by parts to derive an adjoint problem independent of 5u , which, if solved by 
A, implies that (3.11) holds for all 5u. However, as noted previously the assumption that the action of A 
can be written in the form (2.1) for some suitable function \(x) is not always justified. This explains some 
of the difficulties with “inadmissible” objectives that have been reported - the costate A in those problems 
does not have the hypothesized form (2.1), and the adjoint method reaches an impasse. 

On the other hand, if we take as the definition of the adjoint problem the weak formulation (3.11) 
(which is equivalent to (3.7)), we can avoid these problems since we do not presume a priori any particular 
representation for the action of A. Moreover, this definition via duality sheds light on the convergence of 
numerical schemes for the approximation of A and F'(a), as we discuss in the next section. 

4. Convergence of numerical schemes for the solution of the weak form of the adjoint 
equation. First we discuss how suitable numerical solutions A^ 1 of the weak form of the adjoint problem 
converge to A, establishing consistency with the infinite-dimensional problem. Wc then give a concrete 
example of such a scheme that is also consistent with the finite-dimensional optimization problem. 

Let L = S u . We assume that for any v wc can solve the linearized problem Lw = v and that we 
approximate solutions w by w h £ U h } where h is a discretization parameter and U h is an approximating 
space. 

In the discretization of the adjoint problem, wc require that A* 1 satisfy 

<V\ Lw h ) = {ft w h ) 

for all w h £ U h . This defines the action of X h on the range V h of U h under L. There remains the definition 
of A* 1 on the complement of V h , which we do as follows. For v £ V 1 let w h be the approximate solution of 
Lw = v at the discretization level h. Then we require 

(A\ v) = (fl w h ). 

This agrees with the preceding definition of \ h on V h . Moreover, 

(A h , Lw) = {ft ™ h ) = <A h , Lw h ) , 

so we have the adjoint Galerkin condition: 

(4.1) (\ h , Lw - Lw h ) = 0 
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for all w, w h . 

Given v E V, wc can find w E U for which Lw = v. Then 

(A - A ft , v) = < X-X h , Lw) 

= (A, Lw - Lw h ) + (A - A\ Lw*) + {X h , Lw h - Lw) 
= ( fu , W - w h ) + (f u - /£, w h ) + (A' 1 , Lio'* - Lw) . 


Applying (4.1) reduces this to 


(4.2) 


(A - A\ v) = (f u , w-w h ) + (f u - fl w h ) . 


This is the basic identity for approximating the error A — A*; we will give examples of its use in §5. 

From this discussion we see that the error in the approximation of A is governed by how well we approx- 
imate f u and by the rate of convergence of w h to w for solutions of Lw = v. Approximation of f u is not 
trivial: unless / is linear in u, f u will involve the solution of the forward problem (1.2). We will also sec in 
§5 that the sense in which w h must approximate w need not be as strong as the norm on U. 

These estimates guarantee convergence in what might be only a very weak norm, such as a negative 
Sobolev norm; this sense of convergence may be odd. For instance, cos kx — > 0 in H -1 [0,27 t] as k — ► oo. 
The possibility of convergence of numerical approximations of A in a weak sense is not a purely theoretical 
possibility, either; an example of this phenomenon in optimal control is discussed in [4]. Nevertheless, we 
can still use the weak form of the adjoint problem to obtain meaningful approximations of F'(a). 


4.1. The convergence of the discrete adjoint approach. The discrete adjoint problem is one 
practical instance of the abstract approached outlined in the preceding section. Furthermore, the discrete 
adjoint problem is consistent with the discretized forward problem. 

For this discussion we assume that we are approximating the forward problem (and its linearization) by a 
Galerkin finite element method. Extension to the more general Petrov-Galerkin methods is straightforward. 

Let {(f>i , - * ■ be the basis for the finite element space U h ) where N = N(h). We assume that the 
linearized operator L = S u is a map L : H m — ■* k > 0, and that f u is bounded on H m , That is, U — H k , 
U f — H~ k , V = H m , and V f = H~ m . The case k < 0 can be handled along the lines wc discuss here, but 
requires a slightly different argument. For brevity we omit this discussion. 

From the discussion in §2.4, we know that we can approximate A (in H~ k norm) by functionals of the 
form 

(A\ v) - J X h (x)v(x), 

where A^(x) e L 2 . In particular, we can approximate A by functionals X h of this form for which A^(x) € U h \ 

X h {x) = Y j X , l^{x). 

i= 1 


If we do so, then (3.7) becomes 


N 


(4.3) 


SEW / <t>i(x)L<f>t{x) = J^wt (fu, <t>j) ■ 
i=i j= l J j = l 


The standard matrix associated with the finite element method has appeared. Let 

a-ij = J <t>i(x)L<t>j(x). 
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Then the matrix A = (a»j) is the stiffness matrix from the finite element method. If we define 

\ h = (\ h 1 ,---,\ h N ) T 

Wh = (u>i,v,U>a, r) T 

/« = (</£. *>,•■•,</£, 4>n)) t , 

we can rewrite (4.3) as 

(4.4) \ h Aw h = f u w h . 

If (4.4) is to hold for all w h , we must have 

(4.5) A T X h = /„. 

This we recognize as the discrete adjoint, where we take the transpose (in the usual sense of linear algebra) 
of the linearized forward equation. This is the system that would result if we were to treat the finite element 
basis coefficients as the state variables in the optimization problem at level h. 

The discrete adjoint is thus consistent with both the finite- dimensional discretized problem and the 
infinite-dimensional problem. However, we stress that this latter consistency is in the sense of convergence 
as linear functionals. If A can be represented as the solution of a differential equation, there is no guarantee 
that the finite- dimensional transpose of the scheme for the solution of the linearized forward problem will 
be consistent (in the usual sense of finite- difference and finite element methods) with the adjoint differential 
equation. 

This stronger sense of consistency is known not to hold in general; [16] gives a counterexample. There 
the authors examine upwind differencing and show how transposition of the discretization of the linearized 
forward problem yields a numerical scheme for the adjoint problem that does not necessarily inherit either 
the consistency or order of accuracy (in the usual sense of finite-difference schemes) of the finite- difference 
scheme for the linearized forward problem. Given the correspondence of upwind differencing and the lowest- 
order discontinuous Galerkin methods [7] one may be able to adapt this counterexample to the finite element 
setting as well. 

The example in [16] is not at odds with our discussion. We are interested in consistency (i.e., convergence) 
of \ h and A in the sense of linear functionals. As we have discussed, this may be weaker than the convergence 
of A* 1 to A as functions (e.g., pointwisc convergence or convergence in L 2 ). It is this latter sense of consistency 
that is discussed in [16]. 

4.2. Consequences for the calculation of sensitivities. In this section we discuss the consequences 
of the preceding analysis for the calculation of F f (a). From (3.10) we have 

F'(a ) - F^(a) = ( f a - / a h ) * 1 + S* (A - A h ) + (S a - S*) x A\ 

Since || T x || = || T [| for any bounded linear operator T, 

|| F'(a) - Fh (a) || < || /„ - / 0 h || + || S Q || || A - A h || + \\S a - || || A* || . 

Estimates on || A — X h || and || X h || then apply as part of the bound on the error in approximating F'(a). 
However, the approximation of F f (a) also depends on how well we approximate f a and S a , so the approxi- 
mation of A may be only part of the story of approximating ^'(a). 
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Also note that the error A — X h appears in the term S* (A — X h ) in F'(a) — F f h {a ), and 

(S a x (A - A*) , r,) = (A - X h , S aV ) = </„ - /*, L~ l S a r]) . 

Only the error in A restricted to the range R = S a X C V plays a role in the approximation of F'(a). A 
bound on || A — X h || would consider the error over all of V. It might be possible to use this observation 
suggests it to obtain sharper estimates on the error F f {a) — F^a) in certain situations (say, if L~ l R were to 
consist of very smooth functions). 

5. Illustration. We close our discussion with the following illustrative example. Let ft be the square 
{{pc i ^ 2 ) | “1 <:ci,2 C2<1} and consider the problem 


(5.1) 


-A u = a in ft 
u = 0 on dCt. 


The design variable is a = a{x). The objectives we consider are convex quadratic functions of a, so the 
optimization problems arc trivial. However, these objective serve to illustrate the points we wish to make 
about computing the costate and derivative of the objective. 

Since (5.1) is linear in u, the linearization L = S u has the same form: Lw = v is given by 


(5.2) 


Aw = v in ft 
w = 0 on 9ft. 


We also have S a r] = —77 or S a = —I. 

We will approximate A using the discrete adjoint approach, as discussed in 4.1. Applying piecewise linear 
elements to (5.1) and (5.2) and assuming a reasonable mesh, we have the estimate 


and a similar estimate for w ) w h . 

The first objective we consider is 


** || L . < Ch 2 II « II*. 




where ip is a datum we wish to match. We will assume that a £ L 2 \ then u £ H 2 and 5, L : H 2 —* L 2 . 
Consequently, we expect A £ L 2 . 

We have 


{fu,w-w h )= f (u-ip)(w-w h ) 

Jo 


from which we obtain 


(fu, W - w h ) | < II U-lf> \\ L 2 1| w - w h 


L 2 < Ch 2 || V II ia 


Observe that convergence of w h to w in L 2 suffices and we do not need convergence in H l or H 2 . 
Meanwhile, assuming ip £ U h for simplicity, we have 


(fu - fu, ) = j (w - U h ) W h , 


(fu - fu, U> k ) | < || « - U h 


It 2 


Hia < Ch 2 II v I L 2 • 
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From these estimates and (4.2) we arrive at 


|| A - X h \\ L2 < Ch 2 . 

We obtained this result via a general argument using the weak form of the adjoint problem. We could 
also have arrived at this estimate by deriving the adjoint problem explicitly and examining its solutions. On 
the other hand, the objective 

F(a) = \ | Vu(0) | 2 

is more easily handled by the weak form of the adjoint problem. Objectives of this sort arise when trying to 
match data measured only at points. 

In this case we assume a € C 2 (0). Then u € H 4 and F(a) is defined. Continuing with piecewise linear 
finite elements, for any e € (0, 1) we have [15] for some C e depending on e:, 

|| Vu - Vu h || Loo <C £ h e \\a\\ Laa 

|| Vto - Vw h \\ Loo < C £ h e || v || Loo < C e h‘ || » ||„ 2 . 

Taking care to keep x = 0 interior to an element, we have 

| </» - fu, w) | = | (Vu(0) - Vu») T Viu(O) | 

<C\\Vu- Vu h || iao || w || H2 < C'h' || a || t . || « || L2 

and 

| </ u , w - w h ) | = | Vu(0) T (Vtn(0) — Vrt;^(0)) | 

< II Vu \\ Lbb || Vw - Vw h || ioo < C e , u h e || u || H2 . 

From these bounds and (4.2) we arrive at 

|| A -A' 1 || H . 3 <C £ h'. 

This holds despite the fact that A is not even in L 2 : formally, 

— AA = (d Xl u( 0)d Xl -f d X2 u(0)d X2 ) 5 in 

A — 0 on 

We can write A in terms of the fundamental solution for the Laplacian: 

(5.3) A(ar) = -t (9 Xl u(0)3 Xl + d X2 u(0)d X2 ) log | x \ + r(x), 

li r 

where r is smooth. The singular leading term is not in L 2 ; one can check that A € H~ l \ L 2 . 

Finally, we can estimate the error in F'(a). Note that f a — 0 and S a = = — /, so 

|| F\a) - Fi(a) || (C2y = || A — A h || tf _, < C e h*. 

Since F f (a ) = A ^ C 2 we cannot take (5.3) as defining a gradient or descent direction for the optimization 
problem. One would need to compute a direction of steepest descent with respect to a suitable norm as 
discussed in §2.3; see [10] for a further discussion of this point. 
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6. Conclusion. We have discussed some of the important analytical and computational issues in nu- 
merical approaches to the adjoint problem with an emphasis on the systematic approach to the adjoint 
problem via its weak form and its numerical solution. In the process we have shown how solutions of the 
discrete adjoint problem are suitably consistent with both the finite-dimensional and infinite-dimensional 
optimization problems. From numerical approximations of F ; (a) we can compute the directions of steepest 
descent and quasi-Newton directions needed for nonlinear programming algorithms. 

One topic that deserves closer attention is the case where the governing equations are solved using finite- 
difference and finite- volume methods. These cases are of particular interest since they are most effective 
when the state u is smooth (except possibly for well-defined loci of discontinuity). The smoother the class 
of possible u the larger the class of possible cost functionals, and thus the broader the possibilities for A and 
F'(a). 

A related question is the effect of numerical errors due to quadrature and other approximations. De- 
pending on the locality of the interesting features of A, these errors (and other “variational crimes” discussed 
in [17]) may be significant. The interpretation of some finite- difference and finite- volume methods in terms 
of Galerkin and Petrov- Galcr kin finite element schemes with particular quadrature rules may allow one to 
relate results for finite elements to finite- difference and finite- volume methods (and vice versa). 

7. Acknowledgments. The author wishes to thank David Keyes for answering many questions con- 
cerning finite element methods. 
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